The relationship between imputation error and statistical power in genetic association studies in diverse populations.

Authors

  • Lucy Huang
  • Chaolong Wang
  • Noah A Rosenberg
Abstract

Genotype-imputation methods provide an essential technique for high-resolution genome-wide association (GWA) studies with millions of single-nucleotide polymorphisms. For optimal design and interpretation of imputation-based GWA studies, it is important to understand the connection between imputation error and power to detect associations at imputed markers. Here, using a 2x3 chi-square test, we describe a relationship between genotype-imputation error rates and the sample-size inflation required for achieving statistical power at an imputed marker equal to that obtained if genotypes at the marker were known with certainty. Surprisingly, typical imputation error rates (approximately 2%-6%) lead to a large increase in the required sample size (approximately 10%-60%), and in some African populations whose genotypes are particularly difficult to impute, the required sample-size increase is as high as approximately 30%-150%. In most populations, each 1% increase in imputation error leads to an increase of approximately 5%-13% in the sample size required for maintaining power. These results imply that in GWA sample-size calculations investigators will need to account for a potentially considerable loss of power from even low levels of imputation error and that development of additional genomic resources that decrease imputation error will translate into substantial reduction in the sample sizes needed for imputation-based detection of the variants that underlie complex human diseases.
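
The link between imputation error and sample-size inflation can be illustrated with a rough sketch. It rests on one standard fact: for a chi-square test at a fixed significance level and fixed degrees of freedom, power depends only on the noncentrality parameter, which grows linearly with sample size, so the sample size needed for a given power is inversely proportional to the per-individual noncentrality. Everything else here is an assumption made for illustration and is not the authors' error model: the function names, the genotype frequencies, and a uniform misclassification scheme in which an erroneous call is equally likely to be either of the two wrong genotypes. The output is therefore not expected to reproduce the percentages quoted in the abstract.

```python
def apply_error(freqs, error_rate):
    """Genotype frequencies after uniform misclassification (an assumed model).

    Each genotype is called correctly with probability 1 - error_rate and is
    otherwise replaced by one of the other genotypes with equal probability.
    """
    n = len(freqs)
    total = sum(freqs)
    return [
        (1.0 - error_rate) * freqs[j] + error_rate / (n - 1) * (total - freqs[j])
        for j in range(n)
    ]


def ncp_per_individual(case_freqs, control_freqs, case_fraction=0.5):
    """Per-individual noncentrality of the 2x3 chi-square test of independence."""
    ncp = 0.0
    for p_case, p_control in zip(case_freqs, control_freqs):
        pooled = case_fraction * p_case + (1.0 - case_fraction) * p_control
        if pooled > 0.0:
            ncp += case_fraction * (p_case - pooled) ** 2 / pooled
            ncp += (1.0 - case_fraction) * (p_control - pooled) ** 2 / pooled
    return ncp


def sample_size_inflation(case_freqs, control_freqs, error_rate):
    """Fractional increase in sample size needed to match error-free power."""
    true_ncp = ncp_per_individual(case_freqs, control_freqs)
    observed_ncp = ncp_per_individual(
        apply_error(case_freqs, error_rate),
        apply_error(control_freqs, error_rate),
    )
    return true_ncp / observed_ncp - 1.0


# Hypothetical genotype frequencies (AA, Aa, aa) in cases and controls.
cases = [0.4225, 0.455, 0.1225]
controls = [0.49, 0.42, 0.09]

for error_rate in (0.02, 0.04, 0.06):
    inflation = sample_size_inflation(cases, controls, error_rate)
    print(f"error {error_rate:.0%}: sample size must grow by about {inflation:.1%}")
```

Working with the per-individual noncentrality avoids evaluating the noncentral chi-square distribution directly, which keeps the sketch free of external dependencies. A more realistic analysis would replace `apply_error` with an error matrix estimated from actual imputation results.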

Related articles

The congruence between matrilineal genetic (mtDNA) and geographic diversity of Iranians and the territorial populations

Objective(s): Since ancient times, the emergence of agriculture in the region connecting Mesopotamia and the Iranian plateau, at the foothills of the Zagros Mountains, has made the Iranian gene pool an important source for populating the region. It has differentiated the population spread and different language groups. In order to trace the maternal genetic affinity between Iranians and other populatio...

Estimation of genotype imputation accuracy using reference populations with varying degrees of relationship and marker density panel

Genotype imputation from low-density to high-density (SNP) chips is an important step before applying genomic selection, because denser chips can provide more reliable genomic predictions. In the current research, the accuracy of genotype imputation from low and moderate-density panels (5K and 50K) to high-density panels in the purebred and crossbred populations was assessed. The simulated popu...

Effect of Reference Population Size and Imputation Methods on the Accuracy of Imputation in Pure and Mixed Populations

Imputation from low-density to high-density chips has been introduced as a method to increase the accuracy of genomic selection in animals. In the current study, to investigate imputation accuracy, three populations, mixed (scenario 1), pure (scenario 2), and mixed + pure (scenario 3), were simulated using QMSim. Two imputation methods, Beagle and FImpute, were used fo...

Investigating the genetic relationship between bipolar I mood disorder and beta-thalassemia minor

Background and purpose: Previous studies indicate an important genetic factor in the etiology of β-thalassemia and bipolar mood disorder. There have been several case reports implicating a possible association between the two conditions, but the results of a cross-sectional study did not confirm it. Regarding different patterns of mutations in different geographical areas, this study was per...

Missing data imputation in multivariable time series data

Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography and finance. Many time series datasets contain missing data. Imputing missing data in multivariate time series is a challenging topic and needs to be carefully considered before learning from or predicting time series. Much research has been done on the use of diffe...

Journal:
  • American journal of human genetics

Volume 85, Issue 5

Pages: -

Publication date: 2009